What is Clean Architecture?
Clean Architecture, introduced by Robert C. Martin (Uncle Bob), is a software design philosophy that emphasizes separation of concerns and independence from frameworks, UI, and databases.
Core Principles
Independence of Frameworks
The architecture doesn’t depend on the existence of libraries or frameworks
Testability
Business rules can be tested without UI, database, or external services
Independence of UI
The UI can change without changing the rest of the system
Independence of Database
Business rules aren’t bound to a specific database
The Dependency Rule
The fundamental rule of Clean Architecture is that source code dependencies point only inward, toward the domain:
- Domain (innermost): Pure business logic, no dependencies
- Application: Use cases, depends only on domain
- Infrastructure & Presentation (outermost): Technical details, depend on domain abstractions
How It’s Applied in This Project
1. Domain Layer: The Core
The domain layer contains entities with embedded business rules and validation:
domain/models/movie.py
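A minimal sketch of such an entity follows. The field names and the specific validation rules are illustrative assumptions, not the project's actual code; only the use of `__post_init__` for validation comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """Domain entity sketch; all field names here are assumed."""

    imdb_id: str
    title: str
    year: int
    rating: float

    def __post_init__(self) -> None:
        # Business rules live with the entity, so an invalid Movie
        # can never be constructed in the first place.
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if not 0.0 <= self.rating <= 10.0:
            raise ValueError("rating must be between 0 and 10")
        if self.year < 1888:  # assumed lower bound, for illustration only
            raise ValueError("year is implausibly early")
```

The module imports only from the standard library, which is what keeps the entity independent of any framework or storage choice.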
Notice how the Movie entity has zero dependencies on frameworks or infrastructure. It only knows about business rules.
2. Domain Interfaces: Contracts
Interfaces define contracts without implementation details:
domain/repositories/movie_repository.py
domain/interfaces/scraper_interface.py
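As a hedged sketch, contracts like these could be written as abstract base classes. The class names MovieRepository and ScraperInterface appear on this page; the method names `save` and `scrape`, their signatures, and the reduced `Movie` stand-in are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Movie:
    """Reduced stand-in for the domain entity; fields are assumed."""

    imdb_id: str
    title: str


class MovieRepository(ABC):
    """Persistence contract; it says nothing about CSV or SQL."""

    @abstractmethod
    def save(self, movie: Movie) -> None:
        """Persist a single movie."""


class ScraperInterface(ABC):
    """Contract for anything that can produce movies."""

    @abstractmethod
    def scrape(self) -> List[Movie]:
        """Return the scraped movies."""
```

Because both classes are abstract, instantiating them directly raises `TypeError`; only concrete subclasses in outer layers can be created.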
3. Application Layer: Use Cases
Use cases orchestrate business logic by depending on domain interfaces:
application/use_cases/composite_save_movie_with_actors_use_case.py
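One plausible shape for a composite use case is sketched below. The class names come from this page, but the `execute` method and the constructor signature are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class UseCaseInterface(ABC):
    """Assumed contract: a use case exposes a single execute method."""

    @abstractmethod
    def execute(self, movie: Any) -> None:
        """Run the use case for one movie."""


class CompositeSaveMovieWithActorsUseCase(UseCaseInterface):
    """Runs several save use cases in order (composite pattern)."""

    def __init__(self, use_cases: List[UseCaseInterface]) -> None:
        # Copy the list so later changes by the caller have no effect.
        self._use_cases = list(use_cases)

    def execute(self, movie: Any) -> None:
        # The composite knows only the abstraction, never CSV or Postgres.
        for use_case in self._use_cases:
            use_case.execute(movie)
```

With this shape, saving to two targets at once is a matter of passing two use cases to the constructor.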
The use case depends on UseCaseInterface (abstraction), not on concrete implementations. This follows the Dependency Inversion Principle.
4. Infrastructure Layer: Implementations
Concrete implementations live in infrastructure and depend on domain interfaces:
infrastructure/persistence/csv/repositories/movie_csv_repository.py
infrastructure/persistence/postgres/repositories/movie_postgres_repository.py
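A minimal sketch of the CSV side, using only the standard library, might look like this. The `save` signature, the column layout, and the stream-based constructor are assumptions. A PostgreSQL repository would implement the same interface with a database connection in place of the stream.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)
class Movie:
    """Reduced stand-in for the domain entity; fields are assumed."""

    imdb_id: str
    title: str


class MovieRepository(ABC):
    @abstractmethod
    def save(self, movie: Movie) -> None:
        """Persist a single movie."""


class MovieCsvRepository(MovieRepository):
    """Writes one CSV row per movie to any text stream."""

    def __init__(self, stream: TextIO) -> None:
        self._writer = csv.writer(stream)

    def save(self, movie: Movie) -> None:
        self._writer.writerow([movie.imdb_id, movie.title])


# Usage: an in-memory buffer stands in for a real file.
buffer = io.StringIO()
MovieCsvRepository(buffer).save(Movie("tt0000001", "Example"))
```

The import direction is the point: infrastructure imports the domain contract, never the other way around.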
5. Presentation Layer: Entry Point
The CLI delegates to the application layer:
presentation/cli/run_scraper.py
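A thin entry point of this kind can be sketched with `argparse`. The `--limit` flag and the `run_use_case` callable are hypothetical names chosen for the example, not options of the real script.

```python
import argparse
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the scraper")
    # Hypothetical flag, included only to show argument handling.
    parser.add_argument("--limit", type=int, default=10)
    return parser


def main(
    run_use_case: Callable[[int], int],
    argv: Optional[List[str]] = None,
) -> int:
    """Parse arguments, delegate to the application layer, return an exit code."""
    args = build_parser().parse_args(argv)
    saved = run_use_case(args.limit)  # all real work happens in the use case
    print(f"Saved {saved} movies")
    return 0
```

The function contains no scraping or persistence logic, so replacing the CLI with another front end would leave the use case untouched.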
Separation of Concerns
What Goes Where?
Domain Layer
- Entities: Movie, Actor, MovieActor
- Value Objects: Could include ImdbId, Rating, etc.
- Repository Interfaces: MovieRepository, ActorRepository
- Service Interfaces: ScraperInterface, ProxyProviderInterface
- Business Rules: Validation logic in entity __post_init__
Application Layer
- Use Cases: SaveMovieWithActorsCsvUseCase, CompositeSaveMovieWithActorsUseCase
- Application Services: Orchestration logic
- DTOs (if needed): Data Transfer Objects for cross-boundary communication
Infrastructure Layer
- Repository Implementations: MovieCsvRepository, MoviePostgresRepository
- Scraper Implementations: ImdbScraper
- Network Services: ProxyProvider, TorRotator
- Database Connections: PostgresConnection
- Dependency Container: DependencyContainer
Presentation Layer
- CLI: run_scraper.py
- API Controllers (if added): REST endpoints
- View Models (if needed): UI-specific data structures
Testability Benefits
Testing Domain Logic
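A domain test needs nothing beyond the entity itself. The sketch below uses the standard library's `unittest`; the reduced `Movie` entity and its rating rule are assumptions for illustration.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """Reduced stand-in entity with one assumed business rule."""

    title: str
    rating: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.rating <= 10.0:
            raise ValueError("rating must be between 0 and 10")


class MovieTest(unittest.TestCase):
    def test_valid_movie_is_created(self) -> None:
        self.assertEqual(Movie("Example", 8.5).rating, 8.5)

    def test_out_of_range_rating_is_rejected(self) -> None:
        # No database, network, or framework setup is required.
        with self.assertRaises(ValueError):
            Movie("Example", 11.0)
```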
Testing Use Cases with Mocks
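Because use cases receive their collaborators through the constructor, a mock can take the repository's place. In this sketch `SaveMovieUseCase` and its `execute` and `save` methods are hypothetical names; `unittest.mock.Mock` is from the standard library.

```python
from unittest.mock import Mock


class SaveMovieUseCase:
    """Hypothetical use case: persists a movie through an injected repository."""

    def __init__(self, movie_repository) -> None:
        self._movie_repository = movie_repository

    def execute(self, movie) -> None:
        self._movie_repository.save(movie)


def test_use_case_saves_through_repository() -> None:
    repository = Mock()  # stands in for any MovieRepository implementation
    movie = object()     # the use case never inspects the movie here

    SaveMovieUseCase(repository).execute(movie)

    # The only observable behaviour is the call made on the abstraction.
    repository.save.assert_called_once_with(movie)
```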
Testing Infrastructure
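Infrastructure tests exercise the real technical detail, here the file system. The sketch below is a round-trip test against a temporary directory; the repository's constructor and `save` signature are assumptions, not the project's API.

```python
import csv
import tempfile
from pathlib import Path


class MovieCsvRepository:
    """Hypothetical CSV repository that appends one row per save."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def save(self, imdb_id: str, title: str) -> None:
        # newline="" lets the csv module control line endings itself.
        with self._path.open("a", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerow([imdb_id, title])


def test_csv_repository_round_trip() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "movies.csv"
        MovieCsvRepository(path).save("tt0000001", "Example")
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
    assert rows == [["tt0000001", "Example"]]
```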
Clean Architecture enables isolated unit tests for domain and application layers, and integration tests for infrastructure.
Benefits in Practice
1. Swap Implementations Easily
Change from CSV to PostgreSQL without touching business logic.
2. Add New Features Without Breaking Existing Code
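A hedged sketch of extension by addition: a second scraper implements the same interface, and the calling code stays as it was. `RequestsScraper`, `BrowserScraper`, `collect_titles`, and the injected fetch functions are hypothetical; no real Playwright or requests calls are made, so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class ScraperInterface(ABC):
    @abstractmethod
    def scrape(self, url: str) -> List[str]:
        """Return one string per scraped title (assumed signature)."""


class RequestsScraper(ScraperInterface):
    """Stand-in for the existing scraper; the HTTP fetch is injected."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch

    def scrape(self, url: str) -> List[str]:
        return self._fetch(url).splitlines()


class BrowserScraper(ScraperInterface):
    """New implementation; a browser-backed renderer could be injected."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render

    def scrape(self, url: str) -> List[str]:
        lines = self._render(url).splitlines()
        return [line.strip() for line in lines if line.strip()]


def collect_titles(scraper: ScraperInterface, url: str) -> List[str]:
    # Application code sees only the interface, so it needs no changes.
    return scraper.scrape(url)
```

Adding `BrowserScraper` required no edit to `RequestsScraper` or `collect_titles`, which is the property this section describes.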
Add a Playwright scraper alongside the existing requests-based scraper.
3. Test Business Logic Without Infrastructure
Domain entities can be tested instantly without databases or network calls.
Common Pitfalls to Avoid
Do follow best practices:
- ✅ Domain defines interfaces, infrastructure implements them
- ✅ Use dependency injection to wire components
- ✅ Keep business logic in domain and application layers
- ✅ Make infrastructure and presentation thin adapters
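The wiring described in the list above can be sketched as a small container that is the only place aware of concrete classes. The class names echo those on this page, but the in-memory bodies, the `backend` parameter, and `SaveMovieUseCase` are assumptions made so the example runs without files or a database.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class MovieRepository(ABC):
    @abstractmethod
    def save(self, title: str) -> None:
        """Persist a movie title (assumed signature)."""


class MovieCsvRepository(MovieRepository):
    """Stand-in: records rows in memory instead of writing a file."""

    def __init__(self) -> None:
        self.saved: List[str] = []

    def save(self, title: str) -> None:
        self.saved.append(title)


class MoviePostgresRepository(MovieRepository):
    """Stand-in: records rows in memory instead of using a database."""

    def __init__(self) -> None:
        self.saved: List[str] = []

    def save(self, title: str) -> None:
        self.saved.append(title)


class SaveMovieUseCase:
    def __init__(self, repository: MovieRepository) -> None:
        self._repository = repository

    def execute(self, title: str) -> None:
        self._repository.save(title)


class DependencyContainer:
    """Maps a configuration value to a concrete repository class."""

    _backends: Dict[str, Type[MovieRepository]] = {
        "csv": MovieCsvRepository,
        "postgres": MoviePostgresRepository,
    }

    def __init__(self, backend: str) -> None:
        self.repository = self._backends[backend]()

    def save_movie_use_case(self) -> SaveMovieUseCase:
        return SaveMovieUseCase(self.repository)
```

Switching from `DependencyContainer("csv")` to `DependencyContainer("postgres")` changes the storage target while `SaveMovieUseCase` stays byte-for-byte the same.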
Real-World Impact
This architecture has enabled:
- Hybrid Persistence: Simultaneously save to CSV and PostgreSQL
- Network Resilience: Easily integrate VPN, proxies, and TOR
- Future-Proof: Add Playwright without rewriting business logic
- Maintainability: Clear boundaries make code easy to understand
- Testability: 90%+ code coverage without complex mocking
Further Reading
Domain Models
Explore entity validation and business rules
Dependency Injection
Learn how components are wired together
Persistence Layer
Deep dive into data persistence
Use Cases
Understand application workflows